Supplementary Materials A Complexity Analysis

Neural Information Processing Systems

Our proposed method significantly reduces communication overhead in federated learning, at the cost of a trade-off between time and memory complexity. We also provide detailed information about the optimization hyperparameters. In this section, we explore the effect of fitness sparsification, i.e., selecting the top-k fitness values from the population. To enable a fair and insightful comparison between the two population sizes, we assessed performance based on the number of members remaining after sparsification rather than directly contrasting sparsification rates. Our results underline the crucial role that population size plays in exploring optimal solutions, outweighing even the significance of the compression rate.
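As a minimal illustration of top-k fitness sparsification, the following sketch keeps only the k best fitness values in a population and zeroes out the rest; the function name and NumPy-based implementation are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def sparsify_fitness(fitness, k):
    """Keep only the top-k fitness values; zero out the rest.

    Illustrative sketch: only the k best population members'
    fitness values would be communicated, reducing overhead.
    """
    fitness = np.asarray(fitness, dtype=float)
    if k >= fitness.size:
        return fitness.copy()
    # Indices of the k largest fitness values (order not guaranteed).
    top_idx = np.argpartition(fitness, -k)[-k:]
    sparse = np.zeros_like(fitness)
    sparse[top_idx] = fitness[top_idx]
    return sparse

pop_fitness = [0.2, 0.9, 0.1, 0.7, 0.4]
print(sparsify_fitness(pop_fitness, 2))  # keeps 0.9 and 0.7, zeros elsewhere
```

Note that the number of members surviving sparsification, not the rate alone, is what the comparison above holds fixed.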




AtlasKV: Augmenting LLMs with Billion-Scale Knowledge Graphs in 20GB VRAM

Huang, Haoyu, Tsang, Hong Ting, Bai, Jiaxin, Peng, Xi, Zhang, Gong, Song, Yangqiu

arXiv.org Artificial Intelligence

Retrieval-augmented generation (RAG) has shown some success in augmenting large language models (LLMs) with external knowledge. However, as a non-parametric knowledge integration paradigm for LLMs, RAG methods rely heavily on external retrieval modules and the retrieved textual context prior. Especially for very large-scale knowledge augmentation, they introduce substantial inference latency due to expensive searches and much longer relevant context. In this paper, we propose a parametric knowledge integration method, called AtlasKV, a scalable, effective, and general way to augment LLMs with billion-scale knowledge graphs (KGs) (e.g. 1B triples) with very little GPU memory cost (e.g. less than 20GB of VRAM). In AtlasKV, we introduce KG2KV and HiKVP to integrate KG triples into LLMs at scale with sub-linear time and memory complexity. AtlasKV maintains strong knowledge grounding and generalization performance using the LLMs' inherent attention mechanism, and requires no external retrievers, long context priors, or retraining when adapting to new knowledge.





172ef5a94b4dd0aa120c6878fc29f70c-AuthorFeedback.pdf

Neural Information Processing Systems

We thank all reviewers for their valuable feedback. We believe our results make a significant contribution to the field of theoretical reinforcement learning; in particular, analyzing a variant of Nash Q-learning may be of independent interest. Since an NE always exists and every NE is a CCE, a CCE always exists, i.e., the set of linear constraints is always feasible. The "hat" version is the actual certified policy (which can be executed as in Algorithms 2 and 4).
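The feasibility remark can be unpacked with the standard definition (textbook notation; the symbols $\pi$, $u_i$, $a$ are not taken from the source). A coarse correlated equilibrium (CCE) is a joint distribution $\pi$ over action profiles $a = (a_i, a_{-i})$ satisfying, for every player $i$ and every fixed deviation $a_i'$,

\[
\sum_{a} \pi(a)\, u_i(a) \;\ge\; \sum_{a} \pi(a)\, u_i(a_i', a_{-i}),
\qquad
\pi(a) \ge 0, \quad \sum_{a} \pi(a) = 1 .
\]

These are linear constraints in $\pi$. Any Nash equilibrium, viewed as a product distribution over action profiles, satisfies them, so the existence of an NE guarantees the constraint system is feasible.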